Efficient Approximate Top-k Query Algorithm Using Cube Index
Authors
Abstract
Exact top-k query processing has attracted much attention recently because of its wide use in many research areas. Because users' judgments are subjective, missing some of the truly best answers is inherent and unavoidable, and processing exact top-k queries is very expensive on large datasets; answering approximate top-k queries instead is therefore attractive. In this paper, we define a novel kind of approximate top-k query, called the approximation top-k query, and introduce an efficient indexing structure, the cube index, to support it. Based on the cube index, we propose a novel algorithm, the Cube Index Algorithm (CIA). We analyze the complexity of both building the cube index and running CIA. Moreover, extensive experiments show that CIA significantly outperforms the well-known top-k query algorithm, the Threshold Algorithm (TA).
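For context on the TA baseline named in the abstract, the following is a minimal sketch of the Threshold Algorithm, not of CIA or the cube index, which the abstract does not specify. It assumes each attribute list is an in-memory dictionary, that scores are non-negative, and that the aggregation function is a plain sum; the function name, the `theta` parameter, and the sample data are hypothetical. With `theta` greater than 1 the stopping rule is relaxed in the style of the approximate TA variant, so the scan may stop earlier.

```python
import heapq


def threshold_algorithm(lists, k, theta=1.0):
    """Sketch of TA over in-memory lists (assumption: aggregation is a sum).

    lists: one dict per attribute, mapping object id -> non-negative score.
    theta: 1.0 gives the exact stopping rule; values above 1.0 relax it.
    Returns (top-k list of (object, total score), sorted-access depth used).
    """
    # Sorted access order: each list ranked by descending score.
    ranked = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}
    depth = 0
    max_depth = max(len(r) for r in ranked)
    while depth < max_depth:
        last_scores = []
        for r in ranked:
            if depth < len(r):
                obj, score = r[depth]
                last_scores.append(score)
                if obj not in seen:
                    # Random access: fetch the object's score in every list.
                    seen[obj] = sum(l.get(obj, 0.0) for l in lists)
            else:
                # Exhausted list: unseen objects score 0 here.
                last_scores.append(0.0)
        depth += 1
        # Upper bound on the total score of any object not yet seen.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold / theta:
            return top, depth
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), depth


# Hypothetical sample data: two attribute lists over four objects.
example_lists = [
    {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1},
    {"a": 0.8, "b": 0.7, "c": 0.9, "d": 0.2},
]
exact_top, exact_depth = threshold_algorithm(example_lists, k=2)
approx_top, approx_depth = threshold_algorithm(example_lists, k=2, theta=1.1)
```

On this sample data the relaxed run stops one sorted-access round earlier than the exact run while returning the same two objects, which illustrates the cost and accuracy trade-off that motivates approximate top-k queries.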
Similar resources
Keyword Search in Text Cube: Finding Top-k Relevant Cells
We study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell...
Dynamic Update Cube for Range-sum Queries
A range-sum query is very popular and becomes important in finding trends and in discovering relationships between attributes in diverse database applications. It sums over the selected cells of an OLAP data cube where target cells are decided by the specified query ranges. The direct method to access the data cube itself forces too many cells to be accessed, therefore it incurs a severe overhe...
An Efficient and Versatile Query Engine for TopX Search
This paper presents a novel engine, coined TopX, for efficient ranked retrieval of XML documents over semistructured but nonschematic data collections. The algorithm follows the paradigm of threshold algorithms for top-k query processing with a focus on inexpensive sequential accesses to index lists and only a few judiciously scheduled random accesses. The difficulties in applying the existing ...
Efficient Top-K Query Algorithms Using Density Index
Top-k query has been widely studied recently in many applied fields. Fagin et al. [3] proposed an efficient algorithm, the Threshold Algorithm (i.e. TA), to process top-k queries. However, in many cases, TA does not terminate even if the final top-k results have been found for some time. Based on these, we propose a novel algorithm: Density Threshold Algorithm (i.e. DTA), which is designed to m...
KLEE: A Framework for Distributed Top-k Query Algorithms
This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We present KLEE, a novel algorithmic framework for distributed top-k queri...